Comparisons on Different Approaches to Assign Missing Attribute Values
Abstract
A commonly used but naive way to process data with missing attribute values is to ignore the instances that contain them. This method may neglect important information within the data: a significant amount of data can easily be discarded, and the discovered knowledge may not contain significant rules. Other methods, such as assigning the most common value or an average value to the missing attribute, make use of all the available data. However, the assigned value may not come from the information from which the data was originally derived, so noise is introduced into the data. We introduce RSFit, a new approach to processing data with missing attribute values based on rough set theory. By matching attribute-value pairs within the same core or reduct of the original data set, the assigned value preserves the characteristics of the original data set. We compare our approach with the "closest fit approach globally" and the "closest fit approach in the same concept". Experimental results on UCI data sets and a real geriatric care data set show that our approach achieves comparable accuracy in assigning the missing values while significantly reducing the computation time.
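The two baseline strategies mentioned in the abstract, assigning the most common value and assigning an average value, can be sketched roughly as follows. This is only an illustration and not the RSFit method or the authors' code: the function names, the list-of-dicts data layout, and the use of None to mark a missing value are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: a missing attribute value is encoded as None


def impute_most_common(rows, attr):
    """Fill missing values of a symbolic attribute with its most frequent known value."""
    known = [r[attr] for r in rows if r[attr] is not MISSING]
    fill = Counter(known).most_common(1)[0][0]
    # Return new dicts so the input rows are left unchanged.
    return [{**r, attr: fill} if r[attr] is MISSING else dict(r) for r in rows]


def impute_mean(rows, attr):
    """Fill missing values of a numeric attribute with the mean of its known values."""
    known = [r[attr] for r in rows if r[attr] is not MISSING]
    fill = mean(known)
    return [{**r, attr: fill} if r[attr] is MISSING else dict(r) for r in rows]


# Hypothetical toy data: the second case is missing both attributes.
rows = [
    {"color": "red", "age": 30},
    {"color": None, "age": None},
    {"color": "red", "age": 50},
    {"color": "blue", "age": 40},
]
filled = impute_mean(impute_most_common(rows, "color"), "age")
```

In this toy data the second case receives the most frequent color and the mean age regardless of its other attributes, which is the sense in which such assigned values may not reflect the case they are assigned to.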
Similar Resources
A Comparison of Several Approaches to Missing Attribute Values in Data Mining
In the paper nine different approaches to missing attribute values are presented and compared. Ten input data files were used to investigate the performance of the nine methods to deal with missing attribute values. For testing both naive classification and new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error r...
Mining Incomplete Data with Many Missing Attribute Values: A Comparison of Probabilistic and Rough Set Approaches
In this paper, we study probabilistic and rough set approaches to missing attribute values. Probabilistic approaches are based on imputation: a missing attribute value is replaced either by the most probable known attribute value or by the most probable attribute value restricted to a concept. In this paper, in a rough set approach to missing attribute values we consider two interpretations of ...
A comparison of traditional and rough set approaches to missing attribute values in data mining
Real-life data sets are often incomplete, i.e., some attribute values are missing. In this paper we compare traditional, frequently used methods of handling missing attribute values, which are based on preprocessing, with another class of methods dealing with missing attribute values in which rule induction is performed directly on incomplete data sets, i.e., handling missing attribute values a...
Cost Efficiency Measures In Data Envelopment Analysis With Nonhomogeneous DMUs
In conventional data envelopment analysis (DEA), it is assumed that all decision making units (DMUs) use the same input and output measures, meaning that the DMUs are homogeneous. In some settings, however, this usual assumption of DEA might be violated. A related problem is the problem of missing data, where a DMU produces a certain output or consumes a certain input but the val...
A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data
Recently, results on a comparison of seven successful methods of handling missing attribute values were reported. This paper describes experimental results on the three most successful methods out of these seven. Two of these methods, based on a Closest Fit idea (searching in a remaining data set for the closest fit case and replacing a missing attribute value by the corresponding known value fr...